Summary: An Essay on Aggregation Theory and Practice
Edwin Kuh 


This paper shows how aggregation can improve the efficiency of
econometric estimation. The new theorems depend on the random
coefficient regression model, Theil's aggregation analysis, and
assumptions about the stability of shares held by individual economic
entities in macroeconomic totals. Section 1 summarizes existing views
held by econometricians about aggregation problems and their solution.
Sections 2 and 3 offer an exposition of existing aggregation analysis
and the basic new theorems, which show precisely how the macro parameter
variances tend to zero as the number of elements in the aggregate
increases. Section 4 presents a simple investment model estimated from
data for 105 manufacturing firms divided among four industries, to test
the underlying assumptions and also to test directly the predictions of
aggregation gain generated by the theory. The test outcomes are
sufficiently favorable that the existence of aggregation gain and its
measurability now seem to be clearly established. Section 5 takes up the
question of potential aggregation gains for all Census four digit
industries. Section 6 discusses the implications for research strategy
when micro coefficients differ, contrary to a basic assumption of the
random coefficient model employed throughout this paper.

An Essay on Aggregation Theory and Practice


Edwin Kuh 


1. Introduction 

One of the most perplexing problems in the design of econometric
research, and one that has remained largely unsolved, is how highly
aggregated data can be and still yield valid estimates. The main purpose
of this paper is to show not only that aggregation gains exist, but also
how one can provide operational tests of their extent.

Theil's aggregation theorems [21] show that behavior as reflected
in postulated population regression coefficients must be nearly identical
for all members in a given population if estimates based on aggregated
data are to reflect behavior in a meaningful manner.* These implications
of Theil's analysis have recently been reinforced by Orcutt, Watts and
Edwards [15], whose simulation results suggest that estimates from
aggregate data generated from one particular simplified macro-economic
model can be inefficient and sometimes inconsistent, even when the micro
parameters are identical. There exists a conflicting position, composed
of three strands, that is favorable to greater aggregation. First,
Griliches and Grunfeld [6] showed that error variances for aggregate
regression equations will be less than the sum of the micro variances if
negative error covariance terms are sufficiently great; furthermore,
costs are normally less for estimates based on macro data. They
restricted themselves to a simplified aspect of forecasting, neglecting
altogether the problem of aggregation bias. Second, there is a
metaphysical view, most vigorously espoused by Friedman and Meiselman
[5], that the relevant behavior of the economy can be most efficiently
represented by a single aggregated model; "truth will out" even in the
aggregates, so that complex, disaggregated models of behavior are
ineffectual ways to test macro economic propositions. This view is a
matter of faith unrelated to statistical hypotheses. Third, the
pragmatic view of perhaps a majority of practicing econometricians is
that micro data probably are better, but that they are either
unavailable or excessively expensive, so that all one can do practically
is use macro data.

*Certain special data configurations permit heterogeneous coefficients,
yet avoid what Theil calls contradictions between the micro hypothesis
and the macro hypothesis.

From a somewhat different aspect, many cross-sectional studies based
on micro data (individual firms, families or individuals) have proven so
disappointing that many researchers prefer to avoid this data base. The
major source of failure is the high error variances, i.e. low
correlations, even though the statistical significance tests may be
highly affirmative in the sense of decisively rejecting the
no-relationship null hypothesis. Yet it is exceedingly cold comfort to
find a "very significant" multiple correlation which in a large sample
may explain no more than 5% or 10% of the dependent variable's
fluctuations, so that the systematic component of the explanation
[illegible] is negligible. I have shown [13] that a strong presumption
exists that a component of cross-sectional error variances can and
should be imputed to individual effects which vary across individuals
but are constant over time. The basic problem of explanatory power still
remains, however.

Finally, as with Goldilocks, something that seems "just right" can
be found in sub-aggregates such as two digit industries, or
sub-categories of the labor force broken down by age, sex, and skill.
While the last remark is mere obiter dictum at this juncture, an
analytical basis for it will soon be provided: that is the central
purpose of this paper. In short, "proper" aggregation is not merely
neutral, nor just a major cost saving (which it usually is), but causes
major reductions in the estimated parameter error variances, relative to
the underlying, individual error variances.


2. One Resolution - A Sketched Proof 


Because the theorems advanced later necessitate an involved notation,
even though the results are straightforward, it is convenient to
introduce the notation here in the context of a simplified example. The
basic analytical tools are twofold. The first is the random coefficient
model, which was brought to the attention of practicing econometricians
by Klein [10] in the context of cross section models, much as we are
doing here. The complex maximum likelihood estimator proposed by Klein
discouraged subsequent application. Nevertheless, the random coefficient
model itself has much to recommend it. Casual introspection (the kind
most of us excel in) suggests that in making our individual consumer
expenditure decisions, we do not go through an exact calculation
involving many relative prices and income and then tack on a random
error to this painstaking result; rather, we often buy haphazardly, on
impulse, even though on "average" our purchases do reflect more basic
consumer preferences and economic parameters. This latter behavior is
more compatible with a random coefficient model than it is with the
standard shock model. The random coefficient model, whether or not it is
inspired by the motivational scheme above, also helps to rationalize the
dilemma posed by the extremely low variance explanation of so much
cross-section analysis. The basic economic behavior is stable and
explicable on the average, even though it is not in one given time slice
when the variances of the individual coefficients are large. Indeed, a
related alternative explanation is the errors-in-variables model, where
the argument has been made that cross-sectional consumption and income
data contain transient, i.e. error, components which will lead to biased
least squares estimates. The effects on error variances and estimated
standard errors of the regression coefficients have normally been given
less attention.

C. R. Rao [16] has shown that ordinary least squares is an unbiased, 
efficient estimator in the random coefficient case. That article contains 
the statistical basis for the application of ordinary least squares in the 
aggregation context, to which we now turn. 
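
The unbiasedness half of this claim can be illustrated with a small simulation. The sketch below is not taken from Rao [16]; the sample size, the mean coefficients and their spread are hypothetical values chosen only for illustration, and the regressors are held fixed and drawn independently of the coefficients. It checks only that ordinary least squares recovers the mean coefficient vector on average; it says nothing about efficiency.

```python
import numpy as np

def simulate_ols_random_coefficients(n_obs=200, n_reps=2000, seed=0):
    """Average OLS estimates over many draws of a random coefficient model."""
    rng = np.random.default_rng(seed)
    beta_mean = np.array([1.0, -0.5])   # hypothetical mean of the micro coefficients
    beta_sd = np.array([0.8, 0.4])      # hypothetical spread of the coefficients
    X = rng.normal(size=(n_obs, 2))     # regressors, held fixed across replications
    estimates = np.empty((n_reps, 2))
    for r in range(n_reps):
        # Each observation draws its own coefficient vector around beta_mean.
        betas = beta_mean + beta_sd * rng.normal(size=(n_obs, 2))
        y = np.sum(X * betas, axis=1) + rng.normal(scale=0.5, size=n_obs)
        estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta_mean, estimates.mean(axis=0)

beta_mean, ols_average = simulate_ols_random_coefficients()
print("mean coefficients:   ", beta_mean)
print("average OLS estimate:", ols_average)
```

Any single replication can be far from the mean coefficients, which is the low explanatory power discussed above; it is the average over replications that settles near them.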

Theil's aggregation theory [21], the second analytical device on which
this paper depends, has been translated into compact matrix notation by
Kloeck [11]. I shall reproduce the main results (not complete proofs) in
a slightly extended notation.

We will suppose that there are N individual behavior equations of the form:

$$ y_i = x_i \beta_i + \varepsilon_i, \qquad i = 1, 2, \ldots, N \qquad (1) $$

where $y_i$ is a column vector with T rows for the ith individual, $x_i$
is a (T x K) set of exogenous variables for the ith individual,
$\beta_i$ is a (K x 1) parameter vector for the ith individual, and
$\varepsilon_i$ is the column vector of errors for the ith individual.*

*The time subscript is inessential in what follows, so we have
suppressed it from the outset.

Corresponding to these micro equations is the macro equation:

$$ Y = X\beta + \varepsilon \qquad (2) $$

where the variables X and Y are aggregates of the micro data $x_i$ and
$y_i$. In the context of Theil-type aggregation, $\beta$ is defined as
the mathematical expectation of the least squares estimator. When each
micro series for each exogenous variable is regressed on all the macro
exogenous variables (a useful step essential to a clear interpretation
of the macro parameters in terms of the micro parameters) the resulting
regression coefficients will be called the aggregation weights. These
are presented in (3). The precise nature of these weights will be made
clear in the example which follows immediately.


$$ B_{im} = \left[ B_{im}^{(1)} \;\; B_{im}^{(2)} \;\; \cdots \;\; B_{im}^{(K)} \right], \qquad m = 1, 2, \ldots, K \qquad (3) $$

This vector obviously has K components and for each individual there
exist K such vectors.

Theil has shown (using a modification of Kloeck's notation) that a
typical macro coefficient $\beta_\ell$ is a B-weighted sum of the micro
parameters, where the aggregation weights have certain properties, e.g.
a typical macro parameter is:

$$ \beta_\ell = \sum_{i=1}^{N} B_{i1}^{(\ell)} \beta_{i1} + \sum_{i=1}^{N} B_{i2}^{(\ell)} \beta_{i2} + \cdots + \sum_{i=1}^{N} B_{iK}^{(\ell)} \beta_{iK} \qquad (4) $$

There would be K such equations, one for each macro parameter.
The B aggregation weights satisfy the following conditions as a direct
implication of least squares:

$$ \sum_{i=1}^{N} B_{ip}^{(\ell)} = \begin{cases} 1 & \text{for } p = \ell \\ 0 & \text{for } p \neq \ell \end{cases} \qquad (5) $$

In words: the weights for corresponding parameters sum to unity, and sum
to zero for non-corresponding parameters.
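
The adding-up conditions can be checked numerically. The following is a minimal sketch on synthetic data, assuming auxiliary regressions without an intercept; the function name `aggregation_weights` and the array layout are illustrative choices, not part of the original analysis. Because the micro series sum to the macro series, the estimated weights summed over individuals reproduce the identity matrix, which is the content of the conditions stated in words above.

```python
import numpy as np

def aggregation_weights(micro):
    """Least squares aggregation weights.

    micro has shape (N, T, K): N individuals, T periods, K exogenous series.
    Returns an (N, K, K) array whose entry [i, l, m] is the coefficient on
    macro series l when micro series m of individual i is regressed on all
    K macro series (no intercept, by assumption).
    """
    macro = micro.sum(axis=0)                 # (T, K) macro aggregates
    n_individuals, _, n_series = micro.shape
    weights = np.empty((n_individuals, n_series, n_series))
    for i in range(n_individuals):
        # One call fits all K auxiliary regressions for individual i.
        weights[i] = np.linalg.lstsq(macro, micro[i], rcond=None)[0]
    return weights

rng = np.random.default_rng(0)
micro_data = rng.normal(size=(5, 40, 2))      # synthetic micro series
B = aggregation_weights(micro_data)
print(np.round(B.sum(axis=0), 10))            # identity matrix up to rounding
```

Diagonal entries of the summed matrix are the corresponding weights (summing to unity) and off-diagonal entries are the non-corresponding weights (summing to zero).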

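Having set out much of the general notation, I wish to illustrate the
main point of this article with a simple example. We wish to estimate
the parameters of the following macro equation:

$$ Y = (X_1 \;\; X_2) \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \varepsilon \qquad (6) $$

where

$$ \begin{aligned}
\beta_1 &= \sum_{i=1}^{N} B_{i1}^{(1)} \beta_{i1} + \sum_{i=1}^{N} B_{i2}^{(1)} \beta_{i2} \\
\beta_2 &= \sum_{i=1}^{N} B_{i1}^{(2)} \beta_{i1} + \sum_{i=1}^{N} B_{i2}^{(2)} \beta_{i2}
\end{aligned} \qquad (7) $$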

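Suppose now that the micro parameters are mutually independent,
identically distributed random variables. Then the expected values of
the macro parameters are:

$$ E(\beta_1) = \sum_{i=1}^{N} B_{i1}^{(1)} E(\beta_{i1}) + \sum_{i=1}^{N} B_{i2}^{(1)} E(\beta_{i2}) = \bar\beta_1 + 0 = \bar\beta_1 \qquad (8) $$

where $\bar\beta_1 = E(\beta_{i1})$ is the common mean of the first
micro parameter, and similarly for $\beta_2$.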


This result follows immediately from the fact that the aggregation
weights are fixed numbers which sum to unity for corresponding parameters
and zero for noncorresponding parameters.* Theil, in a related context,
[...] variates. It is worth quoting his observations on this matter in
full:

    These results indicate that there are no problems of aggregation
    bias if one works with a random-coefficients micromodel. In fact,
    the macrocoefficients contrary to the microcoefficients are not
    random at all when N is sufficiently large. Note, however, that
    the assumption of nonstochastic x's is not at all innocent. If we
    select households at random, not only their β's but also their
    x's become stochastic. This does not mean that we cannot treat
    the x's as if they are fixed. But it does mean that if we do so,
    we operate conditionally on the x's and we assume implicitly that
    the conditional distribution of the β's given the x's is
    independent of these x's. Thus, the analysis is based on the
    condition that over the set of individuals who are aggregated,
    there is stochastic independence between the factors determining
    their behavior (the x's) and the way in which they react given
    these factors (the β's).*

*Zellner [22] proved this same proposition. A concluding section
relates the material of this paper to papers by Theil [20] and
Zellner [22].
The great convenience of the random coefficient model in the aggregation
context (initially recognized by Zellner [22], see the last footnote) is
that it allows individual parameters to be different at each point of
time, which seems realistic, yet be the same on average, a necessary
condition for successful aggregation which should be realizable through
appropriate grouping of the data.

The main contribution of this paper is to show that the variance
of the estimated macro coefficient goes to zero as the number of
individuals in the aggregate increases, although this result would
appear to be counterintuitive since the variance of the sum of
independently distributed random variables tends to infinity as the
number of items in the sum tends to infinity. Thus the efficiency of
estimation increases as the aggregate grows. In one limiting case, the
variance diminishes as 1/N, i.e. it is as if the parameters were
averaged instead of being made up of aggregated data for which the
underlying error variances cumulate toward an indefinitely large sum as
they are subsumed in the aggregation.

*Theil [20], pp. 5-6.

The variance of $\beta_1$ can be written by inspection of the first row
in (7) as:

$$ V_N(\beta_1) = \sigma_1^2 \sum_{i=1}^{N} \left(B_{i1}^{(1)}\right)^2 + \sigma_2^2 \sum_{i=1}^{N} \left(B_{i2}^{(1)}\right)^2 + 2\sigma_{12} \sum_{i=1}^{N} B_{i1}^{(1)} B_{i2}^{(1)} \qquad (9) $$

where $\sigma_1^2$ is the variance of the first micro parameter,
$\sigma_2^2$ is the variance of the second micro parameter and
$\sigma_{12}$ their covariance, assuming the micro parameters to be
identically and independently distributed for all N individuals.
We must now evaluate the nature of the B weights more closely, for
it is their behavior that basically determines what happens to the
variance of the macro parameter as N increases. The $B_{i1}^{(1)}$,
which sum to one by the aggregation weight rules, are the partial
regression coefficients of the ith micro series $x_{i1}$ on the macro
series $X_1$ in $x_{i1} = B_{i1}^{(1)} X_1 + B_{i1}^{(2)} X_2$, and the
$B_{i2}^{(1)}$ are the partial regression coefficients of micro series
$x_{i2}$ on macro series $X_1$ in the multiple regression
$x_{i2} = B_{i2}^{(1)} X_1 + B_{i2}^{(2)} X_2$. While it is true, as
Theil and Kloeck have emphasized, that these are arbitrary weights, it
is nevertheless reasonable to suppose that under a wide variety of
circumstances the $B_{i1}^{(1)}$ will be approximately the proportion of
$X_1$ that originates with $x_{i1}$. One rigid assumption with many
interesting implications is that these are exact proportions, a matter
to which we return in later empirical sections. Furthermore, one would
suppose that the $B_{i2}^{(1)}$ weights would tend to be small, much
smaller than the $B_{i1}^{(1)}$. The arbitrary nature of the entire
estimation procedure for the B's leads one to suppose that the net
regression coefficient of $x_{i2}$ on $X_1$ can be expected to be small
term by term. Furthermore, we know that
$\sum_{i=1}^{N} B_{i1}^{(1)} = 1$ and $\sum_{i=1}^{N} B_{i2}^{(1)} = 0$.

While in theory the individual B's could be anything, we shall
suppose that in most empirical applications the $B_{i1}^{(1)}$ are
positive fractions, while the $B_{i2}^{(1)}$'s may be positive or
negative, but are considerably smaller than the $B_{i1}^{(1)}$'s.
From this assumption the following highly significant and equally
simple assertion emerges:

    The share of each component in an aggregate will remain unchanged
    or decrease when the aggregate increases, and the corresponding B
    weights will behave in corresponding fashion. The sum of squares,
    $\sum_{i=1}^{N} (B_{i1}^{(1)})^2$, will thus gradually decline as
    the squared fractions and their sum decrease.
Consider the case where the following simplified (but illuminating)
conditions are correct:

(a) The $B_{i1}^{(1)}$ weights represent shares which are equally
    divided among members in the aggregate,

(b) The $B_{i2}^{(1)}$ are individually negligible or the variance of
    $\beta_{i2}$ is negligible relative to that of $\beta_{i1}$, and
    similarly for the covariance $\sigma_{12}$ among the micro
    parameters $\beta_{i1}$, $\beta_{i2}$.

Then,

$$ V_N(\beta_1) \approx \sigma_1^2 \sum_{i=1}^{N} \left(B_{i1}^{(1)}\right)^2 = \sigma_1^2 \sum_{i=1}^{N} \left(\frac{1}{N}\right)^2 = \frac{\sigma_1^2}{N} \qquad (10) $$


In this illustrative case, the variance of the macro parameter
approximately obeys the law of large numbers, decreasing as 1/N, i.e. as
if we were averaging parameters of individually distributed variables
rather than taking weighted sums of independent random variables. The
more unequal the weight distribution, the more slowly does the variance
of the macro parameter decrease, but it will decrease. Empirical results
confirm that, even with the skewed distributions that exist in many
manufacturing industries, the sums of the squared shares, to which
corresponding B weights are close analogues, tend rapidly to very small
magnitudes in most four digit industries.
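
The behavior of the variance under equal and unequal shares can be sketched numerically. The code below assumes condition (b) of the illustrative case, so that only the corresponding weights matter and the variance of the macro parameter is the micro variance times the sum of squared shares; the share vectors and the unit micro variance are hypothetical. A small Monte Carlo draw of independent micro parameters is included as a cross-check of the formula.

```python
import numpy as np

def macro_variance(shares, sigma2=1.0):
    """Variance of a share-weighted sum of i.i.d. micro parameters."""
    shares = np.asarray(shares, dtype=float)
    return sigma2 * np.sum(shares ** 2)

def monte_carlo_variance(shares, sigma2=1.0, n_reps=20000, seed=0):
    """Simulated variance of the macro parameter for the same shares."""
    rng = np.random.default_rng(seed)
    shares = np.asarray(shares, dtype=float)
    # Independent micro parameters with common mean 2.0 (arbitrary) and variance sigma2.
    betas = 2.0 + np.sqrt(sigma2) * rng.normal(size=(n_reps, shares.size))
    return np.var(betas @ shares)

equal_10 = np.full(10, 1 / 10)                  # ten equal shares
equal_100 = np.full(100, 1 / 100)               # one hundred equal shares
skewed_100 = np.array([0.5] + [0.5 / 99] * 99)  # one member holds half the total

for name, shares in [("equal, N=10", equal_10),
                     ("equal, N=100", equal_100),
                     ("skewed, N=100", skewed_100)]:
    print(name, macro_variance(shares), monte_carlo_variance(shares))
```

With equal shares the variance is exactly 1/N of the micro variance; in the skewed case one dominant member keeps the sum of squared shares, and hence the variance, near one quarter however many small members are added.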


The remainder of this paper proceeds as follows. A general proof
for the full regression model is presented. Then several sets of
empirical results will be examined to validate or refute the basic
postulates which, after all, are assertions about measurable reality.


3. General Proof 


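Let, for i = 1, 2, ..., N individuals and $\ell$ = 1, 2, ..., K parameters:

$$ \beta_i = \left[ \beta_{i1} \;\; \beta_{i2} \;\; \cdots \;\; \beta_{iK} \right]' \qquad (11) $$

$$ B_i^{(\ell)} = \left[ B_{i1}^{(\ell)} \;\; B_{i2}^{(\ell)} \;\; \cdots \;\; B_{iK}^{(\ell)} \right] \qquad (12) $$

$$ B = \begin{bmatrix} B_1^{(1)} & B_2^{(1)} & \cdots & B_N^{(1)} \\ \vdots & \vdots & & \vdots \\ B_1^{(K)} & B_2^{(K)} & \cdots & B_N^{(K)} \end{bmatrix} \qquad (13) $$

$$ \text{also} \quad \beta_\ell = \sum_{i=1}^{N} B_i^{(\ell)} \beta_i \qquad (14) $$

$$ \text{and} \quad \beta = \left[ \beta_1 \;\; \beta_2 \;\; \cdots \;\; \beta_K \right]' \qquad (15) $$

Here $\beta_i$ is the (K x 1) vector of the micro parameters for
individual i, $B_i^{(\ell)}$ is the (1 x K) vector of the B-weights for
individual i and macro parameter $\ell$, B is the (K x KN) matrix of all
B-weights, $\beta_\ell$ is the typical macro parameter and finally,
$\beta$ is the (K x 1) vector of macro parameters. Micro vectors carry
the individual subscript i; macro parameters carry the parameter
subscript $\ell$.

We make the

Assumption: $\{\beta_i;\ i = 1, 2, \ldots, N\}$ is a sequence of
mutually independent identically distributed (K x 1) random variables,
each with mean vector $\bar\beta$ and positive definite symmetric
covariance matrix $\Omega$.

Then, stacking the N micro vectors into a (KN x 1) vector,

$$ E(\beta) = B\,E\left[ \beta_{1}' \;\; \cdots \;\; \beta_{N}' \right]' = B \left[ \bar\beta' \;\; \bar\beta' \;\; \cdots \;\; \bar\beta' \right]' = \bar\beta \qquad (16) $$

so that the estimator is unbiased.

Furthermore, $\beta$ has covariance matrix

$$ V_N(\beta) = \begin{bmatrix} \sum_{i} B_i^{(1)} \Omega B_i^{(1)\prime} & \cdots & \sum_{i} B_i^{(1)} \Omega B_i^{(K)\prime} \\ \vdots & & \vdots \\ \sum_{i} B_i^{(K)} \Omega B_i^{(1)\prime} & \cdots & \sum_{i} B_i^{(K)} \Omega B_i^{(K)\prime} \end{bmatrix} \qquad (17) $$

whose $\ell$th diagonal element can be written as:

$$ V_N(\beta_\ell) = \sum_{i=1}^{N} B_i^{(\ell)} \Omega B_i^{(\ell)\prime} \qquad (18) $$

$$ = \operatorname{tr}\left[ \Omega \sum_{i=1}^{N} B_i^{(\ell)\prime} B_i^{(\ell)} \right] = \operatorname{tr}\left( \Omega \begin{bmatrix} \sum_{i} \left(B_{i1}^{(\ell)}\right)^2 & \cdots & \sum_{i} B_{i1}^{(\ell)} B_{iK}^{(\ell)} \\ \vdots & & \vdots \\ \sum_{i} B_{iK}^{(\ell)} B_{i1}^{(\ell)} & \cdots & \sum_{i} \left(B_{iK}^{(\ell)}\right)^2 \end{bmatrix} \right) \qquad (19) $$

It must now be shown what restrictions on the B-weights will lead
to a decrease in the macro parameter variances as the aggregate
population grows.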
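
The algebra behind the covariance matrix and its trace form can be verified on arbitrary numbers. The sketch below is only a consistency check under the i.i.d. assumption: it builds the covariance of the stacked micro vector as a Kronecker product of an identity matrix with a common covariance matrix (a compact way of writing independence with common covariance, introduced here for the check), and compares the resulting macro covariance with the sum of quadratic forms and with the trace expression for each diagonal element. The weights are random and are not constrained to satisfy the adding-up conditions, since the identities hold regardless.

```python
import numpy as np

def check_covariance_identities(n_individuals=6, n_params=3, seed=1):
    rng = np.random.default_rng(seed)
    # blocks[i] is the (K x K) block for individual i; its row l is the
    # (1 x K) weight vector for individual i and macro parameter l.
    blocks = rng.normal(size=(n_individuals, n_params, n_params))
    B = np.hstack(list(blocks))                       # (K x KN) weight matrix
    A = rng.normal(size=(n_params, n_params))
    omega = A @ A.T + np.eye(n_params)                # positive definite covariance

    # Covariance of the macro vector from the stacked micro covariance.
    v_full = B @ np.kron(np.eye(n_individuals), omega) @ B.T
    # The same matrix as a sum of quadratic forms over individuals.
    v_sum = sum(Bi @ omega @ Bi.T for Bi in blocks)
    # Diagonal elements via the trace form.
    diag_trace = np.array([
        np.trace(omega @ sum(np.outer(Bi[l], Bi[l]) for Bi in blocks))
        for l in range(n_params)
    ])
    return v_full, v_sum, diag_trace

v_full, v_sum, diag_trace = check_covariance_identities()
print(np.allclose(v_full, v_sum), np.allclose(np.diag(v_full), diag_trace))
```

The trace form is the convenient one for what follows, because it isolates the matrix of sums of squares and cross products of the weights from the fixed covariance matrix.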


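A lemma concerning a sequence of fractions which sum to unity is
needed at this point:

Lemma: For $i = 1, 2, \ldots, N$ and $N = 1, 2, \ldots$, let
$\{p_{iN}\}$ be such that $-1 < p_{iN} < 1$ and
$\sum_{i=1}^{N} p_{iN} = 1$. Then a necessary and sufficient condition
that

$$ \lim_{N\to\infty} \sum_{i=1}^{N} p_{iN}^2 = 0 $$

is that $\lim_{N\to\infty} P_N = 0$, where
$P_N = \max\{|p_{iN}|;\ i = 1, 2, \ldots, N\}$.

Proof: Necessity is obvious. To prove sufficiency, examine the sum

$$ \sum_{i=1}^{N} \left(p_{iN} - \frac{1}{N}\right)^2 = \sum_{i=1}^{N} p_{iN}^2 - \frac{1}{N}. $$

Clearly, $P_N \ge |p_{iN}|$ for each i, so

$$ \sum_{i=1}^{N} \left(p_{iN} - \frac{1}{N}\right)^2 \le P_N \sum_{i=1}^{N} |p_{iN}| - \frac{1}{N}, $$

which equals $P_N - 1/N$ when the $p_{iN}$ are non-negative fractions,
the case of shares considered here. By hypothesis
$\lim_{N\to\infty} P_N = 0$, so that

$$ \lim_{N\to\infty} \sum_{i=1}^{N} \left(p_{iN} - \frac{1}{N}\right)^2 = \lim_{N\to\infty} \left( \sum_{i=1}^{N} p_{iN}^2 - \frac{1}{N} \right) = 0. $$

But the minimum value of $\sum_{i=1}^{N} p_{iN}^2$ is $1/N$, implying
for each N that:

$$ \frac{1}{N} \le \sum_{i=1}^{N} p_{iN}^2 \le P_N, \qquad \text{so that} \qquad \lim_{N\to\infty} \sum_{i=1}^{N} p_{iN}^2 = 0. $$

This lemma enables us to prove Theorem 1.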
Theorem 1. Provided that:

(1) $|B_{i\ell}^{(\ell)}| < 1$ and $\sum_{i=1}^{N} B_{i\ell}^{(\ell)} = 1$,

(2) when $j \neq \ell$, $|B_{ij}^{(\ell)}| \le |B_{i\ell}^{(\ell)}|$, and

(3) $\lim_{N\to\infty} \max\{|B_{i\ell}^{(\ell)}|;\ i = 1, 2, \ldots, N\} = 0$,

then $\lim_{N\to\infty} V_N(\beta) = 0$.

Proof: That $\lim_{N\to\infty} \sum_{i=1}^{N} [B_{i\ell}^{(\ell)}]^2 = 0$
follows from condition (3) and the Lemma. Since
$|B_{ij}^{(\ell)}| \le |B_{i\ell}^{(\ell)}|$ implies that
$\sum_{i=1}^{N} [B_{ij}^{(\ell)}]^2 \le \sum_{i=1}^{N} [B_{i\ell}^{(\ell)}]^2$,
convergence of the right hand side of the inequality to zero as
$N \to \infty$ implies that all diagonal terms in
$\sum_{i=1}^{N} B_i^{(\ell)\prime} B_i^{(\ell)}$ converge to zero as
$N \to \infty$. By the Cauchy-Schwarz inequality, convergence of all
diagonal terms in this matrix to zero as $N \to \infty$ implies that
$\lim_{N\to\infty} \sum_{i=1}^{N} B_i^{(\ell)\prime} B_i^{(\ell)} = 0$
and hence $V_N(\beta_\ell) \to 0$ as $N \to \infty$. Since for each N,
$V_N(\beta)$ is positive definite symmetric,
$\lim_{N\to\infty} V_N(\beta_\ell) = 0$ for $\ell = 1, 2, \ldots, K$
implies that $\lim_{N\to\infty} V_N(\beta) = 0$.


Some few comments may be in order on conditions 1 - 3 of the theorem.
Much of the motivation for them has been provided in the previous section
and empirical support will be forthcoming in subsequent sections. That
all B weights are less than 1 in absolute value follows from the
generally proportion-like character of the corresponding B weights;
$\sum_{i=1}^{N} B_{i\ell}^{(\ell)} = 1$ follows identically from the
least squares restriction.

The second condition, that non-corresponding B weights be smaller
than corresponding B weights, is an implication of the greater
systematic correspondence that $x_{i\ell}$ has with $X_\ell$ than with
$X_j$. Available empirical evidence supports this presumption. Finally,
the last and most critical assumption for the Lemma, and hence for the
proof, depends on how increasingly large aggregates are made up. Suppose
that the $B_{i\ell}^{(\ell)}$ are proportions: then, by construction,
the addition of another amount decreases the proportional share of every
single antecedent member of the aggregate. Unless the aggregate happened
to be constructed systematically to make the (N+1)st member a larger
fraction than the largest preceding one, the condition will be
satisfied. Aggregates are usually built up in two ways: by combining
various sub-populations (e.g. two digit into one digit industries), or
expanding a given sample. In the first case, one can only suppose that
the share of U.S. Steel in total manufacturing is less than it is in the
steel industry. In the second case, supposing sampling to be random, the
share of each member can be expected to halve if the sample is doubled,
irrespective of the shape of the underlying population. In general,
then, we suppose that the B weights will behave in similar fashion and
that this condition will hold in the vast majority of cases.
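
The sampling argument can be made concrete with a small experiment on shares. The sketch below draws random samples of increasing size from a hypothetical, heavily skewed population of firm sizes (the lognormal shape and its parameters are assumptions made only for illustration) and reports the largest share and the sum of squared shares. For non-negative shares the sum of squares always lies between 1/N and the largest share, so the sum of squares is driven toward zero whenever the largest share is.

```python
import numpy as np

def share_stats(sizes):
    """Return (largest share, sum of squared shares) for non-negative sizes."""
    shares = np.asarray(sizes, dtype=float) / np.sum(sizes)
    return float(shares.max()), float(np.sum(shares ** 2))

def doubling_experiment(sample_sizes=(25, 50, 100, 200, 400), seed=2):
    rng = np.random.default_rng(seed)
    # Hypothetical skewed population of firm sizes.
    population = rng.lognormal(mean=0.0, sigma=1.5, size=100_000)
    rows = []
    for n in sample_sizes:
        sample = rng.choice(population, size=n, replace=False)
        p_max, sum_sq = share_stats(sample)
        rows.append((n, p_max, sum_sq))
    return rows

for n, p_max, sum_sq in doubling_experiment():
    print(f"N={n:4d}  largest share={p_max:.4f}  sum of squared shares={sum_sq:.4f}")
```

In any one draw a single very large firm can slow the decline, which is the sense in which skewed distributions converge more slowly than equal shares.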
The preceding theorem and discussion lead directly to an extension of
the assumptions to permit unequal variance matrices.* When the
$\beta_i$ are mutually independent but do not have common variance
matrices, setting $V(\beta_i) = \Omega_i$, we may rewrite (17) as

$$ V_N(\beta) = \begin{bmatrix} \sum_{i} B_i^{(1)} \Omega_i B_i^{(1)\prime} & \cdots & \sum_{i} B_i^{(1)} \Omega_i B_i^{(K)\prime} \\ \vdots & & \vdots \\ \sum_{i} B_i^{(K)} \Omega_i B_i^{(1)\prime} & \cdots & \sum_{i} B_i^{(K)} \Omega_i B_i^{(K)\prime} \end{bmatrix} \qquad (20) $$

and (18) as

$$ V_N(\beta_\ell) = \sum_{i=1}^{N} B_i^{(\ell)} \Omega_i B_i^{(\ell)\prime} = \operatorname{tr} \sum_{i=1}^{N} \Omega_i B_i^{(\ell)\prime} B_i^{(\ell)} \qquad (21) $$

*The task of also proving that the limiting variances are zero when
the individual parameter variance matrices are correlated has not been
attempted. One may conjecture that when the tedious algebra is done, the
basic results obtained here will not be altered. Highly complex product
chains of fractions under the assumptions I have used throughout will
surely converge to zero in the limit. Swamy [19] derived related results
in the heteroscedastic case to be considered in Theorem 2.

Let $\omega_{rc}^{(i)}$ be a generic element of $\Omega_i$ and
$\bar\omega_{rc} = \max\{|\omega_{rc}^{(i)}|;\ i = 1, 2, \ldots, N\}$;
denote by $\bar\Omega$ the matrix with elements $\bar\omega_{rc}$.
Provided that the elements of $\bar\Omega$ are bounded and that
conditions (1)-(3) hold, the proof that $V_N(\beta_\ell) \to 0$ as
$N \to \infty$ when $\Omega_i = \Omega$, $i = 1, 2, \ldots, N$, applies
in this more general case. This is a direct consequence of the fact
that if $\sum_{i} B_i^{(\ell)\prime} B_i^{(\ell)} \to 0$ as
$N \to \infty$, so does
$\operatorname{tr}[\bar\Omega \sum_{i} B_i^{(\ell)\prime} B_i^{(\ell)}]$.
Also, the Cauchy-Schwarz inequality again implies that if all diagonal
terms of $V_N(\beta)$ approach zero as N approaches infinity, so do the
off-diagonal elements of $V_N(\beta)$. Hence we have:

Theorem 2:
Let $\{\beta_i;\ i = 1, 2, \ldots, N\}$ be mutually independent random
vectors with common mean $\bar\beta$, but not necessarily identical
positive definite symmetric variance matrices $V(\beta_i) = \Omega_i$,
$i = 1, 2, \ldots, N$. If conditions (1), (2) and (3) of Theorem 1
obtain, then $\lim_{N\to\infty} V_N(\beta) = 0$.

4. Empirical Evidence -- Firm Investment Relations 


Manufacturing investment equations will provide some evidence on 
the underlying assumptions and the aggregation properties contained in 
the basic theorem. I have chosen investment behavior for several reasons, 
not the least of which is an earlier interest in its substantive aspects 
as well as its aggregation characteristics [13]. In addition, the earli- 
est empirical study of aggregation theory was that of Boot and de Wit [2] 
using investment behavior and based on a study by Grunfeld [7]. Among 
the most recent contributions to testing alternative models of investment 
behavior, research by Jorgenson and Siebert [9] placed heavy reliance on 
individual firm investment equations. In nearly all the studies cited, a 
basic equation form was the simple Chenery-Koyck distributed lag [3][12] 
in which investment is a linear function of sales or output, and the lag- 
ged capital stock. 

Since my present aim is to study aggregation, I shall adopt the Chenery- 
Koyck formulation without theoretical motivation (available however, in the 
sources cited) and without a detailed critique of its merits. The basic 
data have similar origins to the previous firm studies, except that none 
of the series on investment, sales and gross fixed assets were price cor- 
rected. The cost of rectifying the data did not seem warranted in light 
of the predominant methodological thrust of this study. There is little 
reason to believe that failure to correct for price variations will ser- 
iously affect the aggregation characteristics of the underlying series, 


although comparability with previous studies is diminished. 




The raw data came from the COMPUSTAT tape which contains financial 
information from 868 companies listed on the New York Stock Exchange, Amer- 
ican Stock Exchange, and, in a few instances, regional exchanges or those 
with securities traded over the counter. While in most cases the num- 
bers are identical with corporate records, some disparities exist. Accord- 
ing to the COMPUSTAT manual [18]: 

    Differences will frequently occur between the financial
    statistics included in COMPUSTAT and those in Corporation
    Records. These differences may be due to one of the following
    considerations.

    Some restatement of company-reported information has been made
    in COMPUSTAT in accordance with the definitions outlined in this
    manual for the purpose of increasing comparability, both within
    a single company and within a single industry.

    COMPUSTAT includes some material taken from SEC and outside
    sources....

    Although almost any item of information in COMPUSTAT may differ
    from that in the Annual Reports in accordance with the
    definitions given, a few specific notes are listed below in some
    of the most frequently used statistics in the system....

    Annual data in COMPUSTAT for years prior to a merger are not
    stated on a pro forma basis (see page 3-1). Both pro forma and
    reported data are normally given in Corporation Records.

    Employment and capital expenditure data may vary slightly
    because of differing sources of information....


From this larger universe, several industries were selected whose
aggregation properties will be reported. These were chosen on the basis
of size and relative internal homogeneity (two digit industries are
often composed of several heterogeneous three-digit industries, each
containing only a few firms) as well as different production
characteristics between industries. Since computer processing costs
exceeded one dollar per firm, calculations were restricted to four
reasonable sized industries that have different production processes,
in order to restrain costs within reasonable bounds.

                          Table 1

                     Sample Composition

Industry      Years              Firms    Observations
Steel         1949-1967 (19)       23          437
Retail        1949-1966 (18)       19          342
Petroleum     1950-1967 (18)       22          396
Machinery     1950-1967 (18)       41          738

Total                             105         1913


We thus have data beginning in 1949 and ending in 1967 for 105 firms. 
A total of 1,913 observations on investment are to be explained by corres- 
ponding data vectors for sales and gross fixed assets. 
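
The sample counts in Table 1 are those of a balanced panel, one observation per firm per year. A short check of the arithmetic, using only the figures reported in the table, is sketched below; the dictionary layout is an illustrative choice.

```python
# (first year, last year, number of firms, reported observations), from Table 1
table_1 = {
    "Steel":     (1949, 1967, 23, 437),
    "Retail":    (1949, 1966, 19, 342),
    "Petroleum": (1950, 1967, 22, 396),
    "Machinery": (1950, 1967, 41, 738),
}

def panel_observations(first_year, last_year, n_firms):
    """Observations in a balanced panel: firms times years (inclusive)."""
    return (last_year - first_year + 1) * n_firms

computed = {name: panel_observations(a, b, firms)
            for name, (a, b, firms, _) in table_1.items()}
total_firms = sum(firms for _, _, firms, _ in table_1.values())
total_observations = sum(computed.values())

for name, (_, _, _, reported) in table_1.items():
    print(name, computed[name], reported)
print("totals:", total_firms, total_observations)
```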

Since interest may attach to the individual micro results, these,
together with their industry macro regressions, have been reported in
Appendix Table 1-A. They bear a strong resemblance to similar firm
regressions reported by other investigators.


A. Model Assumptions - B-Weights and Proportions 


One basic assumption of the theorem is that proportions resemble
B-weights. By calculating proportions as the sample period sum of each
firm's sales or assets divided by the corresponding sample aggregate
sum, various comparisons with estimated B-weights are possible. One
comprehensive comparison is shown in Table 2-I, which records the simple
regression of the B-weights on proportions, with and without the
intercept. Since the basic hypothesis is homogeneous, the most pertinent
regression for each industry pair is the one that has been forced
through the origin.
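
The two regressions just described can be sketched on synthetic numbers. In the code below the proportions and the B-weights are hypothetical series constructed to sum to unity; they are not the firm data of this study. One useful by-product is an identity: since both series have mean 1/N, the intercept of the unrestricted regression must equal (1 - slope)/N, which offers a quick internal check on any reported slope and intercept pair.

```python
import numpy as np

def fit_with_intercept(proportions, weights):
    """Simple regression of B-weights on proportions; returns (slope, intercept)."""
    slope, intercept = np.polyfit(proportions, weights, 1)
    return float(slope), float(intercept)

def fit_through_origin(proportions, weights):
    """Slope of the regression forced through the origin."""
    return float(np.dot(proportions, weights) / np.dot(proportions, proportions))

rng = np.random.default_rng(3)
n_firms = 23                                   # hypothetical industry size
p = rng.dirichlet(np.ones(n_firms))            # synthetic proportions, sum to one
noise = rng.normal(scale=0.01, size=n_firms)
b = p + noise - noise.mean()                   # synthetic B-weights, also sum to one

slope, intercept = fit_with_intercept(p, b)
slope_origin = fit_through_origin(p, b)
print("with intercept:", slope, intercept)
print("through origin:", slope_origin)
print("(1 - slope) / N:", (1 - slope) / n_firms)
```

Under the hypothesis that B-weights are proportions, both slopes should be near unity and the intercept near zero.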




For some reason, the regressions for asset parameters are very close
to or within plausible range of the slope value of unity required by the
hypothesis in every case, although the hypothesis is refuted for each
sales coefficient except for steel. Considering the extent of
collinearity in all but steel (on which a table will be presented
shortly) the net outcome is mixed, but not unsatisfactory. Starting from
the initial extreme proposition that no economic content can be read
into the auxiliary regression equation parameters, we have come a long
way.

A related measure is shown in Table 2-II. The average absolute
difference between B-weight and proportion divided by the average
corresponding B-weight or proportion* is a sensible normalization for
comparative purposes. The steel industry as before shows the smallest
relative difference while Retail is the worst. Petroleum is fair for
both coefficients while Machinery does poorly on sales and adequately on
assets. Values near zero are good and values of order unity or larger
are bad according to the basic hypothesis.

*These averages are identical since each series sums to unity.
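
The normalized measure can be written in a few lines. The sketch below uses made-up weights and proportions for three firms; the function name is illustrative. Because both series sum to unity, dividing the average absolute difference by the average proportion (1/N) is the same as summing the absolute differences.

```python
import numpy as np

def normalized_abs_difference(weights, proportions):
    """Average |B-weight - proportion| divided by the average proportion."""
    weights = np.asarray(weights, dtype=float)
    proportions = np.asarray(proportions, dtype=float)
    return float(np.mean(np.abs(weights - proportions)) / np.mean(proportions))

b_example = [0.5, 0.3, 0.2]   # hypothetical B-weights, sum to one
p_example = [0.4, 0.4, 0.2]   # hypothetical proportions, sum to one

measure = normalized_abs_difference(b_example, p_example)
print(measure)
print(float(np.sum(np.abs(np.subtract(b_example, p_example)))))  # same value
```

A value of zero means the B-weights coincide with the proportions; a value of unity means the typical discrepancy is as large as the typical share itself.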


                               Table 2

I. Regression of B-Weight on Proportion

                              Sales                        Assets
Industry              Slope       Intercept        Slope       Intercept
Steel      Coeff.     1.0575      -0.0025          0.8356      0.0071
           T-stat    (31.8271)   (-0.9767)        (31.3322)   (3.0041)
           Coeff.     1.0391      forced through   0.8747      forced through
           T-stat    (37.9362)    origin          (32.1531)    origin
Retail     Coeff.    -0.3114      0.0690           0.6174      0.0201
           T-stat    (-1.1032)   (2.7318)         (2.0663)    (0.8785)
           Coeff.     0.1421      forced through   0.7974      forced through
           T-stat    (0.5339)     origin          (3.6911)     origin
Petroleum  Coeff.    -0.1220      0.0510           1.4166      -0.0189
           T-stat    (-0.7194)   (4.2623)         (15.4912)   (-3.1438)
           Coeff.     0.3439      forced through   1.2182      forced through
           T-stat    (1.9665)     origin          (15.4298)    origin
Machinery  Coeff.     0.3136      0.0167           1.0240      -0.0006
           T-stat    (1.6463)    (2.1568)         (11.0235)   (-0.1426)
           Coeff.     0.5596      forced through   1.0168      forced through
           T-stat    (3.5102)     origin          (13.2754)    origin


II. Average Absolute Difference Between B-Weight and Proportion Divided
    by Average B-Weight or Average Proportion

Industry      Sales     Assets
Steel         0.158     0.191
Retail        0.889     0.984
Petroleum     0.419     0.369
Machinery     0.793     0.478



Table 3 presents detailed information bearing on another basic
assumption, which is that corresponding B-weights are large relative to
noncorresponding B-weights. For the Steel industry, the assumption held
up for all 23 asset auxiliary regressions and for 19 out of 23 sales
auxiliary regressions. The Machinery industry had eight and nine
failures to conform, or success on the order of 80%. A similar average
success rate holds up in Petroleum, while Retail had under 50%
correspondence to this particular assumption.*

A comparison of relative statistical significance between corresponding and
non-corresponding B-weights in Table 4 reinforces the impression obtained
from comparing only magnitudes. Retail again excepted, corresponding
B-weights have the greater statistical significance most of the time,
overwhelmingly so for Steel and Petroleum, with somewhat less, though
clearly evident, superiority in Machinery.


Table 4

Number of Times Absolute Value of t-Statistic for Corresponding B-Weight
Exceeds Absolute Value of t-Statistic for Non-Corresponding B-Weight

              No. of Firms    Sales    Assets
Steel              23           18       23
Retail             19           10       13
Petroleum          22           19       18
Machinery          41           35       29


*Table 2-A (Appendix) reports the auxiliary regressions in full detail.

Once again, there appears to be a definite relation to collinearity. The
quality of the results is inverse to the extent of linear dependence: when
there is not excessive correlation among the explanatory variables, the
assumptions of the statistical aggregation model are more nearly validated.

Table 5

Industry      Sales-Asset Simple Correlation
Steel         .8378
Retail        .9978
Petroleum     .9902
Machinery     .9696


The individual t-statistic on the difference between corresponding B's and
proportions has been calculated using the B-weight's standard error and
treating the proportion as a known parameter. This understates the true
variability, so that the outcome is relatively unfavorable to a finding of
no significant difference. The arithmetic mean of the t-statistics tends to
be small, amounting to less than unity for all sales coefficients (and
nearly zero in two of the four industries), about unity for two industry
asset coefficients, and about one-half for the other two. Calculated
t-statistics, rank ordered from largest positive to most negative, appear in
Appendix Table 3-A. Casual inspection is enough to indicate that there is
too much density in the tails for these tabulations to represent identically
distributed random drawings from a t distribution with mean zero. Retail for
both coefficients, and Steel for the sales coefficient, are comparatively
tightly bunched about zero, while the others are much less so. One problem
with these inferences is apparent from examination of Appendix Table 2-A,
which shows proportions and auxiliary regressions. Goodness of fit is nearly
always very high and the corresponding coefficient standard errors are very
small. As a consequence, even though a visual scan reveals that the two
series are close to each other most of the time (with several glaring
exceptions in every industry, to be sure), "significant" differences arise
with substantial frequency.

The upshot of this section will now be summarized. Qualified support has
been found for the assumptions relating corresponding B-weights to
proportions, and the magnitudes of corresponding and non-corresponding
B-weights. When there is excessive collinearity among the explanatory macro
variables, however, the assumptions tend to break down. Yet under this very
circumstance it is difficult or impossible to obtain sensible macro
estimates anyway. In short, when estimation feasibility breaks down most,
so do the aggregation assumptions. Finally, the comprehensive tests
summarizing information about all pairs of proportions and B-weights stand
up somewhat better than pair-wise tests based on the t-statistic.


B. Model Predictions: Micro Variances, Macro Variances and Aggregation Gain 


This theory's main innovation is to predict how aggregation gain can be
expected to occur, and its magnitude. Until recently, that there is any
aggregation gain whatsoever would have been a puzzling result in itself.
Table 6 reports the major empirical results most revealing on this central
feature.

Aggregation gain is measured by the reduction of the estimated regression
coefficient variances in the macro relative to the micro estimates. The
theoretical gain can be calculated in two ways. Assuming, first, that
proportions are good approximations to the corresponding B-weights, their
sum of squares, designated as H (for reasons given in the next section),
shows the expected reduction in the macro variance as a consequence of
aggregation. Its reciprocal is therefore termed the aggregation gain.
Second, and in parallel fashion, the sum of squared corresponding B-weights
and its reciprocal provide another relevant estimate of potential
aggregation gain. Since the postulated empirical relation between the two
measures has been considered already at length, it is noteworthy that the
two sums of squares turn out to be highly similar figures.
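
The two ways of calculating the theoretical gain can be illustrated with a short Python sketch. The function names and the four-firm figures are hypothetical; they are chosen only to show the arithmetic and are not the paper's data. The gain is taken, as in the text, to be the reciprocal of the sum of squared weights.

```python
def sum_of_squares(weights):
    """Sum of squared weights (H when the weights are proportions)."""
    return sum(w * w for w in weights)


def aggregation_gain(weights):
    """Reciprocal of the sum of squared weights."""
    h = sum_of_squares(weights)
    if h <= 0:
        raise ValueError("sum of squares must be positive")
    return 1.0 / h


# Hypothetical four-firm industry (illustrative numbers only).
proportions = [0.40, 0.30, 0.20, 0.10]   # shares in the macro total
b_weights = [0.44, 0.27, 0.18, 0.11]     # corresponding B-weights
equal_shares = [0.25] * 4                # benchmark: equal sized firms

gain_from_proportions = aggregation_gain(proportions)
gain_from_b_weights = aggregation_gain(b_weights)
gain_equal = aggregation_gain(equal_shares)
print(gain_from_proportions, gain_from_b_weights, gain_equal)
```

When the B-weights resemble the proportions, the first two figures are close to each other, and neither can exceed the equal-share benchmark, which equals the number of firms.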

Retail bombed out; as in every other respect, it complied least with the
theoretical assumptions of the basic model. The remaining industries:

1. All show positive aggregation gain;

2. All show gains of rather similar magnitude to the theoretical prediction.

I consider these findings to be the most significant quantitative result in
the paper.

C. Microproportion Stability

Some proportions grow rapidly, others decline swiftly, while some barely
change at all. Since the stability of proportions was assumed at the outset,
some tests were made to evaluate an assumption we knew would be violated in
actuality. A straightforward test is presented in Appendix Table 4-A, from
which Table 7 has been extracted. The two quartiles and the median of the
relative growth rate (defined as the linear trend slope of the proportion,
$dP/dt$, divided by the ratio of variable means, $\bar{x}/\bar{X}$) are
presented here as summary statistics. The top quartile rate of growth is
about 2 1/2% - 4 1/2% per annum in most industries, while the median trend
rate of change is in the neighborhood of 1% - 2 1/2%, so that for most firms
(not all) relative shares did not radically change over the twenty year
sample period.


Table 7

Quartiles for Relative Rate of Growth of Proportions

                      Sales          Assets
Steel
  First Quartile      0.026          0.030
  Median              [illegible]    0.014
  Third Quartile      0.009          0.008

Retail
  First Quartile      0.031          0.039
  Median              0.011          0.025
  Third Quartile      0.005          0.007

Petroleum
  First Quartile      0.034          0.021
  Median              0.024          0.013
  Third Quartile      0.008          0.003

Machinery
  First Quartile      0.045          0.045
  Median              0.025          0.031
  Third Quartile      0.014          0.013
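
The relative growth rate tabulated above can be sketched as follows. This is a minimal illustration under stated assumptions: the proportion is regressed on a simple time index 0, 1, 2, ..., the divisor is the ratio of the firm mean to the industry mean, and the function names and the five-year series are hypothetical rather than taken from Appendix Table 4-A.

```python
def trend_slope(values):
    """Least squares slope of a series on the time index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2.0
    v_mean = sum(values) / n
    numerator = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator


def relative_growth_rate(firm_series, industry_series):
    """Trend slope of the firm's proportion divided by the ratio of means."""
    proportions = [x / total for x, total in zip(firm_series, industry_series)]
    firm_mean = sum(firm_series) / len(firm_series)
    industry_mean = sum(industry_series) / len(industry_series)
    return trend_slope(proportions) / (firm_mean / industry_mean)


# Hypothetical firm whose share of a constant industry total rises
# from 10% to 14% over five years (illustrative numbers only).
firm_sales = [10.0, 11.0, 12.0, 13.0, 14.0]
industry_sales = [100.0] * 5

rate = relative_growth_rate(firm_sales, industry_sales)
print(round(rate, 4))
```

A value near zero indicates a stable share; the sign shows whether the firm is gaining or losing ground within its industry.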




5. Empirical Evidence -- All Manufacturing H Indexes 


According to the previously developed theory, expected gains from
aggregation depend on the number of elements in the aggregate and their
size distribution. Maximum aggregation gain arises when all elements are
the same size. This is evident from the fact that the variance of
proportions will be zero when all shares are identical, and from the
correlative fact that the sum of squared proportions (or variances) is
nearly one for very skewed shares. The sum of squares for equal shares is
1/N.

Data on the sum of squared shares exist for manufacturing industries
because this market parameter has begun to acquire descriptive and
analytical significance in the study of industrial organization. It has
been named the H concentration measure or index after the two originators
who independently promoted it, O.C. Herfindahl and A.O. Hirschman. The most
interesting interpretation is that of M.A. Adelman [1], who shows that the
H index can be viewed as the weighted average slope of the cumulative
concentration curve. In a second interpretation, suppose that there were N
equal sized firms in an industry; then N turns out to equal 1/H. Adelman
suggests that the reciprocal of H can properly be viewed as a "numbers
equivalent".

Since its introduction into industrial organization literature by
Herfindahl [8], the H index has been calculated for various firm attributes
for four (and some five) digit industries in a 1963 volume on concentration
in manufacturing by Ralph L. Nelson [14]. Resort to four digit industries
is stringent but useful. It is stringent in the sense that such a fine
subdivision has comparatively few firms in it, useful because one might
reasonably expect that substantial homogeneity of markets and production
methods will prevail at this classification level, in contrast to the more
common econometric category of two-digit aggregates. The exhaustive
tabulations of Nelson yield several impressions on which we briefly
comment. First, concentration and H measures by value added, by shipments,
by production manhours and by employees yield highly similar results, so
that estimates of aggregation gain can be equally well derived from any of
them.* A second impression is that these various measures are quite stable
over time. These census groupings are much more realistic than the
arbitrarily formed industries selected out of COMPUSTAT, which, for the
most part, is composed of large New York Stock Exchange listed firms.
Several quantitative tabulations bearing on aggregation properties in
manufacturing deserve explicit presentation. The first is a tabulation of
ranked H indexes for 1947 shown in the first column of Table 8. While these
data are now twenty-two years old, they are more complete than, but still
similar to, the latest census information available to Nelson for 1954,
plus annual survey information for 1955, 1956 and 1957. Later tabulations
indicate that these measures are stable according to visual impressions, so
that broad inferences drawn from 1947 will be reliable, even though some
industries now have widely different H and concentration indexes than they
had in 1947.

*It is generally true that establishment based indexes show considerably
less concentration than company data. Thus, subsequent evaluation of these
tabulations is restricted to the series listed only.

An experiment with these data turned out to be a partial success. The
purpose of the exercise was to determine whether a simple one-parameter
function like the exponential can be used to derive H indexes from usually
more readily available data. More concretely, if the concentration density
curve could be adequately represented by an exponential density function
with exponential parameter designated c, it is a simple matter to use
concentration indexes to estimate c.* Furthermore, the sum of squares of
the density function will then be c/2. If the initial hypothesis of
equivalency is correct, c/2 will then be equal to the industry H index and,
in addition, estimates of c for the top four, eight and twenty firms in
each industry will turn out to be approximately the same. While the
implicit estimates of the sum of squares of the hypothesized exponential
were indeed similar in magnitude to the actual H index, the second
implication of the exponential form was not supported by the data. For all
but a few four digit industries, c(4) > c(8) > c(20), where c has been
estimated for each of the four, eight and twenty largest firms in
Table 8.** This result suggests that typically industry is less
concentrated than exponential curves derived in this fashion. Yet it is
still possible to use concentration indexes to obtain an approximate notion
of potential aggregation gain when, as usual, these data are available but
H indexes are not. For industries with high to medium concentration
(actual H > .07), c(4)/2 provides the best estimate of H; c(8)/2 is closest
to H for medium to low concentration (.02 < actual H < .07); and c(20)/2 is
closest to H for the least concentrated industries (actual H < .02). While
close estimates of H were obtained for medium and low concentration
industries, the most highly concentrated industries usually provided the
poorest information about H from the concentration indexes, as an
examination of the top 15 - 20 industries in Table 8 indicates.

*Suppose $f(i) = c e^{-ci}$ gives the i'th firm's proportion of total
industry shipments. Then the proportion of total shipments of the M largest
firms is given by the cumulated density function,

$$ s(M) = \int_0^M c e^{-ci} \, di = 1 - e^{-cM}. $$

Thus, if s(M) is known, the last expression may be solved for c to give

$$ c = \frac{1}{M} \ln \frac{1}{1 - s(M)}. $$

**This table is based on all census industries in Nelson's Table A:1,
Company Concentration Indexes, Industries, 1947, 1954, 1955, and 1956, with
the exception of the following, which could not be used because sufficient
data were not available, or were not reported to avoid disclosing figures
for individual companies: [several codes illegible], 3334, [several codes
illegible], 3553, 3555, 3616, 3821.
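
The exponential approximation can be sketched in Python as follows. The sketch assumes the continuous form in which the share of the M largest firms is 1 - exp(-cM), so that c = -ln(1 - s(M))/M and the implied H is c/2. The function names and the three concentration ratios are hypothetical and are not taken from Nelson's tabulations.

```python
import math


def exponential_parameter(top_share, m):
    """Solve s(M) = 1 - exp(-c * M) for c, given the share of the M largest firms."""
    if not 0.0 < top_share < 1.0:
        raise ValueError("top_share must lie strictly between 0 and 1")
    if m <= 0:
        raise ValueError("m must be positive")
    return -math.log(1.0 - top_share) / m


def implied_h(top_share, m):
    """H index implied by the exponential form: half the estimated c."""
    return exponential_parameter(top_share, m) / 2.0


# Hypothetical concentration ratios for one industry (illustrative only):
# shares of shipments held by the 4, 8 and 20 largest firms.
shares = {4: 0.40, 8: 0.55, 20: 0.75}

c_estimates = {m: exponential_parameter(s, m) for m, s in shares.items()}
h_estimates = {m: implied_h(s, m) for m, s in shares.items()}

for m in sorted(shares):
    print(m, round(c_estimates[m], 4), round(h_estimates[m], 4))
```

With these illustrative shares the estimates decline as more firms are included, which is the pattern c(4) > c(8) > c(20) reported in the text; a true exponential would give the same c at every cut-off.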

H indexes in manufacturing four digit industries have a median value of .09
according to Table 8. Translated into the terms of our approach to
aggregation, the gain in terms of variance reduction from aggregation is
twelve fold. Even at the first quartile mark, above which lie the 25% most
concentrated industries, a seven fold gain will occur from using aggregated
data instead of micro information directly. At the third quartile, an H
index of about .04 implies that a twenty-eight fold aggregation gain may be
expected.

To what extent does the existing degree of concentration in American
industry reduce estimation efficiency? To recall the earlier statement of
the problem, a highly skewed size distribution of elements in an aggregate
reduces the beneficial aggregation effects of collapsing parameter
variances. Subtracting H's minimum theoretical value 1/N (which occurs when
all elements are equal) from actual H offers a simple measure of this loss.
This difference, H - 1/N, also has significance for the analysis of
industrial organization. The results are arrayed in Table 9. The median
within manufacturing is about .07, which is almost the same as the value of
H itself. While the H - 1/N distribution is somewhat more concentrated than
the H distribution and some rearrangement in the rank orderings does occur,
the two distributions strongly resemble each other. Since for all but a
handful of industries 1/N tends to be negligible, we may conclude that it
is not fewness of firms but the skewness of their size distribution that
limits potential aggregation gain.
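
The link between H - 1/N and the dispersion of shares can be made explicit with a short derivation. The notation is an assumption introduced here for illustration: $p_i$ is the share of element $i$, the shares sum to unity so that their mean is $1/N$, and $H = \sum_i p_i^2$.

```latex
\begin{align*}
\sum_{i=1}^{N}\Bigl(p_i - \tfrac{1}{N}\Bigr)^{2}
  &= \sum_{i=1}^{N} p_i^{2} - \frac{2}{N}\sum_{i=1}^{N} p_i + \frac{1}{N} \\
  &= H - \frac{2}{N} + \frac{1}{N} \\
  &= H - \frac{1}{N}.
\end{align*}
```

On this reading H - 1/N is N times the variance of the shares about their mean: it vanishes when all elements are equal and grows with the skewness of the size distribution.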

The stability of concentration measures over time depends on which measure
has been selected, according to Table 10. H indexes show greater
sensitivity than the more standard concentration index. Noticeable
instability is apparent for one fifth of all manufacturing industry total
shipments between 1947 and 1956 for the H index, while a mere three out of
113 industries appeared to deviate significantly from the base share held
by the four largest firms over the same period.*

In studying aggregation, the more sensitive H index is the more pertinent
magnitude. While far from ideal, the overall picture for stability is quite
satisfactory. A batting average of .800 over a decade (what happened in
between we cannot say) is good enough to warrant broad confidence in a
basic presumption of stability. While these data are more inclusive than
the micro data previously analyzed, they are not as powerful, since the
stability in an industry aggregate does not preclude shifts in the
underlying individual distributions.

*A quick subjective examination reveals that the industries 2031, 2043,
[several codes illegible], 3576, 3613, 3631, 3717, 3722, and 3861 had
marked changes in their H index between 1947 and 1956, while industries
2031, 2045 and 3613 had striking changes in the concentration index for the
four largest firms.


Table 9

The Quantity (H - 1/N) from Largest to Smallest

[Table body illegible in source. Columns: Industry, (H - 1/N).]

***** = Withheld to avoid disclosing figures for individual companies.


Table 10

Fraction of Total Industry Shipments of Four Largest Firms
and H Index, 1947 and 1956

[Table body illegible in source. Columns: Industry, Year, Conc. Index,
H Index.]

N.A. = Not available.
***** = Withheld to avoid disclosing figures for individual companies.

6. Heterogeneous Parameters and Research Strategy


A complementary statistical aggregation approach starts with the 
assumption of heterogeneity, but concludes that structural simplification 
through clustering of similar objects can lead to increased understanding 
and manageability, although at some cost in precision. Walter Fisher [4] 
provides operational solutions to this problem through statistical decision 
theory. 

The present treatment of aggregation uses a quadratic cost function, i.e.
least squares, on the aggregate relation, whose implications were long ago
illuminated by Theil. While I have nothing to add to Theil's theory under
conditions where micro parameters differ, the interpretation given here to
the B-weights has a definite bearing on this question.

When the B-weights for corresponding parameters resemble proportions, and
non-corresponding parameter aggregation weights are negligible
term-by-term, ordinary least squares parameter estimates of macro relations
will have desirable properties, even when the micro parameters are
different. In this exposition, my debt to Theil should be evident, even
though it is my inclination to emphasize the economic substance of
B-weights and hence the feasibility of using macro relations effectively to
predict and study behavior.

Assume the following conditions hold: 


1. Corresponding B weights are proportions i.e.


2. Noncorresponding B weights are zero. 

In these circumstances the micro predictions and macro predictions are the
same. As Theil characterizes this sort of outcome, there is no
contradiction between the micro relations and the macro relation. The
forecasting benefits from relying on macro equations are evident, even when
these conditions hold only approximately. Much interest thus attaches to
the stability of shares through time for members of an economic population.
The law of proportional growth, analyzed for instance by Simon and Bonini
[17], suggests that stability in shares is not implausible among economic
populations. The matter is fundamentally an empirical one that has been
examined here for several subsets of manufacturing data.
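
A small numerical sketch in Python may make the no-contradiction case concrete. It rests on strong simplifying assumptions that are not in the paper: a single explanatory variable, no intercept, no disturbance term, and exactly constant shares. All names and numbers are hypothetical. Each firm has its own micro parameter, yet the macro least squares slope equals the share-weighted average of the micro parameters, and the macro prediction coincides with the sum of the micro predictions.

```python
def ols_slope_through_origin(x, y):
    """Least squares slope of y on x with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Hypothetical three-firm industry (illustrative numbers only).
shares = [0.5, 0.3, 0.2]     # each firm's constant share of the macro variable
betas = [0.2, 0.5, 1.0]      # differing micro parameters
macro_x = [100.0, 120.0, 150.0, 130.0]

# Micro data implied by stable shares: x_it = p_i * X_t and y_it = beta_i * x_it.
micro_x = [[p * total for total in macro_x] for p in shares]
micro_y = [[b * x for x in xs] for b, xs in zip(betas, micro_x)]

# Macro dependent variable is the sum of the micro dependent variables.
macro_y = [sum(firm_y[t] for firm_y in micro_y) for t in range(len(macro_x))]

macro_beta = ols_slope_through_origin(macro_x, macro_y)
share_weighted_beta = sum(p * b for p, b in zip(shares, betas))

# Prediction for a new macro total, made both ways.
new_total = 200.0
macro_prediction = macro_beta * new_total
micro_prediction = sum(b * p * new_total for b, p in zip(betas, shares))
print(macro_beta, share_weighted_beta, macro_prediction, micro_prediction)
```

If the shares drift over the sample, the equality holds only approximately, which is the sense in which share stability becomes an empirical question.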

Furthermore, the optimal properties of least squares estimation apply to
the auxiliary equation parameter estimates as an estimator of proportions
under the conditions stated above. When shares are changing, ordinary least
squares provides an efficient estimate of the average proportion over the
sample period.

Efficient predictions can be obtained when shares are stable and parameters
differ, or when parameters are similar and shares are not. Often one, the
other or both sets of conditions will hold; this explains much of the
modest success that highly aggregative macro models have had. Many failures
have their origins in the failure of one or both conditions. The ideal
situation, of course, is one where underlying behavior is relatively
homogeneous so that compositional aspects of the data universe become
secondary. But macro model building strategy can and should proceed on
several fronts simultaneously, relying on aggregates (and hence share
stability) more heavily in the short term while searching for significant
behavioral homogeneities in the longer run. This observation actually
characterizes much econometric model building activity of the past twenty
years, beginning with the simplest three to nine equation Keynesian models.
Research activity then expanded in several directions to disaggregate
behavior equations that obviously contained disparate elements. Thus
aggregate investment was separated into inventory, plant, equipment and
residential construction. Total consumption has by now come to be divided
into at least three components (durables, nondurables and services) in an
effort to isolate significant differences in underlying behavior.
Differential industry behavior has led to the proliferation of economic
sectors. When the disaggregated data can be obtained, it becomes possible
to weigh aggregation gains arising from grouping similar units against the
losses from subsuming too much disparate behavior in one relationship, and
the perils or benefits inherent in shifting or stable shares and, more
generally, aggregation weights.

Material in Section 4 is fragmentary. The empirical propositions held up
well in numerous situations, poorly in others. While clear advantages
accrue from proper aggregation, one noticeable instance that failed
especially badly was Retail Trade. The main reason seemed to be extreme
collinearity. In this case no aggregation gain occurred: in fact, if the
calculations are to be believed, there were actual aggregation losses. In
instances where collinearity is pervasive, we may speculate that the only
possibility for successful estimation is to rely more heavily on
disaggregated data. This is neither necessary nor sufficient to assure
success. It is a conjecture based partly on this study and other
econometric work with micro data.

Another reason that micro data are desirable is fundamental to the
acquisition of a solid basis for using macro equations, namely,
understanding how much heterogeneity exists and in what categories it is
most prevalent, where the main tool would be some form of covariance
analysis. Thus sensible research strategy requires simultaneous exploration
of micro and macro data: micro data to test hypotheses and explore
homogeneity, macro data for statistical efficiency and as a way to hold
expenses down to tolerable levels.

Intensive analysis of aggregation was restricted to manufacturing, using
some concocted two digit industries from COMPUSTAT, for which all possible
parameters were calculated that bear on the theory, and "real" four digit
census data, using the sum of squared shares, or H index, as a direct
indicator of the sort of aggregation gains that can be expected from such
industries in future investigations. With the tools developed in Section 5
for translating concentration indexes into H indexes, personal income
distribution data should be explored. One may speculate that H indexes are
smaller for personal income (income is more evenly distributed than are
firm sizes), so that potential aggregation gain is even greater in the
study of consumption.

It should now be possible to explore wider areas of economic behavior in an
operational framework that will enable econometricians more readily to
adopt a research strategy by choosing aggregation levels on the basis of
systematic information rather than hunch or random availability.

7. Related Literature and Acknowledgements


Zellner [22] was the first to point out that the random coefficient model
provided a natural approach to the problem of consistent aggregation. He
demonstrated that least squares estimation in the random coefficient case,
where data had been aggregated, was unbiased.


Theil [20] has postulated a random coefficient model with identical 


properties to those we have adopted. In the regression equation: 
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where Shea is the micro covariance among micro parameters Bos and Bait 


The variance of Bo can thus be written: 
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where Coe is the coefficient of variation of Xoa during the t period. 




Theil concludes that as the number N of elements in the aggregate 
increases, the variance of the estimated parameters will tend to zero. 
His result is the same as mine, but there are differences in approach 
and some possible contradictions. First, Theil's macro coefficient Bo 
is not an estimated coefficient in the standard sense: it cannot be 
estimated in the absence of the micro parameters themselves. This may be 
seen from the fact that the parameters are strictly functions of time (in- 
volving both Biot and X50) in ways that the more conventional macro para- 
meters I have postulated are not. Second, I have shown that, on admittedly 
extreme assumptions, it is possible for the variance of the macro parameters 
not to diminish as N increases. The same is possible in Theil's presenta- 
tion of the problem, but the precise conditions relating to coefficients 
of variation for the explanatory variables do not seem to be related to 
my requirements. This possible contradiction is not readily explicable. 

An advantage of my treatment, in addition to its being directly 
related to standard least squares estimation, is that the macro 
parameter variance is closely, though approximately, related to 
important and often available information on size distributions of 
firms, individuals, or incomes, for instance. The relative stability of 
these distributions and their properties provide genuine insight into 
when, and how much, aggregation gain can arise from enlarging the 
population aggregate. Coefficients of variation do not have the same 
intuitive significance, are less readily available, and we know less of 
their stability over time. It is reassuring, however, that two rather 
different approaches, with similar assumptions, do reach the same 
conclusions. 
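
The link between the macro parameter variance and the size distribution can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are mine for the purpose of illustration: the macro coefficient is taken to be a fixed share-weighted average of independent micro coefficients with a common mean and a common standard deviation `sigma`, in which case its variance is `sigma**2` times the H index. The names `simulate_macro_variance`, `sigma`, and `n_draws`, and all numerical values, are hypothetical.

```python
import random
import statistics


def h_index(shares):
    """Sum of squared shares (shares are assumed to sum to one)."""
    return sum(s * s for s in shares)


def simulate_macro_variance(shares, sigma, n_draws=10000, seed=0):
    """Variance across draws of a share-weighted average of independent
    micro coefficients, each drawn from Normal(1.0, sigma) (an assumption)."""
    rng = random.Random(seed)
    draws = [
        sum(s * rng.gauss(1.0, sigma) for s in shares)
        for _ in range(n_draws)
    ]
    return statistics.pvariance(draws)


sigma = 0.5
results = {}
for n in (5, 50):
    shares = [1.0 / n] * n                    # equal shares, so H = 1/n
    theory = sigma ** 2 * h_index(shares)     # variance implied by the assumptions
    simulated = simulate_macro_variance(shares, sigma)
    results[n] = (theory, simulated)
    print(f"N={n:3d}  sigma^2*H={theory:.5f}  simulated={simulated:.5f}")
```

Under these assumptions the simulated variance tracks `sigma**2 * H` and falls as N grows with equal shares; with strongly unequal shares H stays large and the gain from adding units is correspondingly smaller.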

APPENDIX 


Table 1-A Investment Regressions - Micro and Macro 

Table 2-A Estimated Coefficients of Auxiliary Equations and Firm 
          Proportions in Industry Totals, Ranked by Proportions 
          in Industry Sales 

Table 3-A Rank Ordering of t-Statistics for (Proportion - B-weight) 

Table 4-A Time Trend Regressions for Firm Proportions, Ranked by Size 
          of Trend Coefficient 

Table 5-A List of Firms and their COMPUSTAT Numbers and ID Numbers 


Note: 


The computer used for calculating summary statistics in this 
study did not round off, but simply truncated all figures 
after the ones printed. Therefore, the means and other 
summary statistics reported here cannot be duplicated exactly 
by using the numbers in these tables or in Table 3. 
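
The effect described in this note can be seen in a small sketch. The figures and the helper `truncate` are hypothetical and chosen only to make the discrepancy visible; two printed decimals are assumed for illustration.

```python
import math


def truncate(x, decimals=2):
    """Drop all digits after the given number of decimals (no rounding)."""
    factor = 10 ** decimals
    return math.trunc(x * factor) / factor


# Hypothetical full-precision values held by the computer.
full_values = [0.119, 0.229, 0.339, 0.449]

# What would appear in a printed table: each figure truncated.
printed_values = [truncate(v) for v in full_values]

# Mean computed from full precision, then truncated for printing.
reported_mean = truncate(sum(full_values) / len(full_values))

# Mean a reader would obtain from the printed (truncated) figures.
recomputed_mean = truncate(sum(printed_values) / len(printed_values))

print("printed values: ", printed_values)
print("reported mean:  ", reported_mean)
print("recomputed mean:", recomputed_mean)
```

Because every printed figure is biased downward by truncation, a mean recomputed from the printed figures can fall below the reported one, which is why the summary statistics cannot be reproduced exactly from the tables.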